Interval iteration algorithm for MDPs and IMDPs

Authors
Abstract

Similar resources

An Improved Policy Iteration Algorithm for Partially Observable MDPs

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-progra...

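The finite-state-controller representation mentioned in that abstract makes policy evaluation a linear-algebra exercise: the value of each (controller node, hidden state) pair satisfies one linear equation. The sketch below is only an illustration of that idea, not the cited paper's implementation; the toy two-state POMDP, the controller layout, and every name and number in it are made-up assumptions.

```python
import numpy as np

# Hypothetical toy POMDP used only to illustrate controller evaluation:
# 2 hidden states, 2 actions, 2 observations; all numbers are made up.
# T[a][s, s2] : probability of moving from state s to s2 under action a
# O[a][s2, o] : probability of observing o after landing in s2 under action a
# R[a][s]     : expected immediate reward for taking action a in state s
gamma = 0.95
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.4, 0.6]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
R = {0: np.array([1.0, 0.0]),
     1: np.array([0.0, 1.5])}

# A finite-state controller: every node fixes an action and maps each
# observation to a successor node.
nodes = [
    {"action": 0, "next": {0: 0, 1: 1}},
    {"action": 1, "next": {0: 0, 1: 1}},
]

def evaluate_controller(nodes, T, O, R, gamma, n_states):
    """Solve the linear system
    V(n, s) = R[a](s) + gamma * sum_{s2, o} T[a](s, s2) O[a](s2, o) V(next_n(o), s2)
    where a is the action fixed by controller node n."""
    n_nodes = len(nodes)
    dim = n_nodes * n_states
    A = np.eye(dim)
    b = np.zeros(dim)
    idx = lambda n, s: n * n_states + s
    for n, node in enumerate(nodes):
        a = node["action"]
        for s in range(n_states):
            b[idx(n, s)] = R[a][s]
            for s2 in range(n_states):
                for o, n2 in node["next"].items():
                    A[idx(n, s), idx(n2, s2)] -= gamma * T[a][s, s2] * O[a][s2, o]
    return np.linalg.solve(A, b).reshape(n_nodes, n_states)

V = evaluate_controller(nodes, T, O, R, gamma, n_states=2)
print(V)  # V[n, s]: value of starting controller node n in hidden state s
```

Given such node values, the value of the controller at any belief is simply the belief-weighted node value, which is why this representation makes the evaluation step of policy iteration straightforward.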

Policy Iteration for Relational MDPs

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the value iteration algorithm. However, exact versions of more complex algorithms, including policy iteration, have not been developed or analyzed. The paper inv...

Policy Iteration for Factored MDPs

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset ...

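A minimal sketch of the factored value function idea described in that abstract: the approximation is a weighted sum of basis functions, each of which reads only a few state variables. The variable split, the basis functions h1–h3, and the weights below are hypothetical examples, not the construction used in the cited paper.

```python
import itertools

# Hypothetical factored state: three binary variables x1, x2, x3.
# Each basis function depends only on a small subset of the variables.
basis = [
    ("h1", ("x1",),      lambda s: 1.0 if s["x1"] else 0.0),
    ("h2", ("x2", "x3"), lambda s: 1.0 if s["x2"] and not s["x3"] else 0.0),
    ("h3", (),           lambda s: 1.0),  # constant offset
]

# Weights would normally be fitted (e.g. by an approximate policy-iteration
# or linear-programming step); these constants are made up for illustration.
weights = {"h1": 2.0, "h2": -0.5, "h3": 1.0}

def factored_value(state):
    """Approximate V(x) as sum_i w_i * h_i(x), each h_i reading few variables."""
    return sum(weights[name] * h(state) for name, _scope, h in basis)

# Enumerate all joint states of the toy example to show the approximation;
# each basis function itself never inspects more than two variables.
for bits in itertools.product([False, True], repeat=3):
    state = dict(zip(("x1", "x2", "x3"), bits))
    print(state, factored_value(state))
```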

Polynomial Value Iteration Algorithms for Deterministic MDPs

Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudopolynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision (DMDP) problems. We show that the basic value iteration procedure converges to th...

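For reference, here is a minimal sketch of the basic value iteration procedure that abstract analyzes, specialized to a deterministic MDP with a discounted objective. The small transition graph, rewards, and stopping tolerance are assumptions made purely for illustration, and the sketch does not attempt to reproduce the cited paper's polynomial-time analysis.

```python
# Hypothetical deterministic MDP: each (state, action) pair leads to exactly
# one successor with a fixed reward; the graph and numbers are illustrative.
gamma = 0.9
mdp = {
    "s0": {"a": ("s1", 1.0), "b": ("s2", 0.0)},
    "s1": {"a": ("s0", 0.0), "b": ("s2", 2.0)},
    "s2": {"a": ("s2", 0.5)},
}

def value_iteration(mdp, gamma, tol=1e-8, max_iters=100_000):
    """Apply the Bellman optimality update V(s) = max_a [r(s,a) + gamma * V(s')]
    until the largest change in any state value falls below tol."""
    V = {s: 0.0 for s in mdp}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in mdp.items():
            new_v = max(r + gamma * V[s2] for (s2, r) in actions.values())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V

V = value_iteration(mdp, gamma)
# Greedy policy extracted from the converged values.
policy = {s: max(acts, key=lambda a: acts[a][1] + gamma * V[acts[a][0]])
          for s, acts in mdp.items()}
print(V)
print(policy)
```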

Journal

Journal title: Theoretical Computer Science

Year: 2018

ISSN: 0304-3975

DOI: 10.1016/j.tcs.2016.12.003